A Fully Parallel Analog VLSI Architecture for Implementing Learning Algorithms

نویسندگان

Renyuan Zhang

Tadashi Shibata

Shuichi Sakai

Akira Hirose

Makoto Ikeda

Yoshio Mita

چکیده

The cognitive functions play very important roles in the real-world tasks such as text analysis, audio processing and visual processing. In these cognitive tasks, the human brain is much superior to traditional very large scale integrated (VLSI) processors or software programs, since the brain can learn from samples autonomously. Therefore, plenty of machine learning algorithms have been developed to realize the learning operations, which were originally implemented by the software programs. Due to the reasons of power consumption and processing performances, a number of attempts to implement the machine learning algorithms were made by using hardware including graphic processing units (GPUs), field programmable gate array (FPGA), and VLSI circuits. Since many computations in the machine learning algorithms are very complex, the implementation costs including computing time and hardware utilization are greatly concerned. Furthermore, a large amount of iterations are always required by these algorithms, the learning speed is also a critical issue. Thus, the challenge on hardware implementations of learning algorithms lies on achieving a high processing speed with the consideration of limited hardware resource. In this thesis, a fully parallel architecture for implementing learning algorithms is proposed by using analog VLSI circuits. Several analog circuitries are designed to carry out the complex functions such as Gaussian function and Euclidean distance. These computations in the learning algorithms can be done in real time within the compact chip area. On the basis of analog computational circuitries, a generally applied architecture in fully parallel is developed to implement some machine learning algorithms. Since the chaos of analog signals is used for learning instead of clock-based numerical iterations, the learning operation is accomplished autonomously and self-converges with a high speed. Furthermore, the chip area and inner connection explosion problem in the traditionally parallel architectures can be prevented. To verify the proposed architecture, the support vector machine (SVM) was implemented by VLSI circuits and fabricated in a complementary metal-oxidesemiconductor (CMOS) technology. SVM is one of most important supervised machine learning algorithms, which has been widely applied in the pattern recognition tasks. In fact, a number of VLSI implementations have been developed to realize the SVM onchip learning. Since the kernel functions in SVM theory are always expensive to carry out by using digital circuits, the analog implementations of SVM algorithm were suggested by some works. There were two problems in the previously developed works. Firstly, the traditional analog circuits applied in these works generate a highly dimensional Gaussian function through single-dimension multipliers. The error intolerably increases as the dimension increases. Therefore, these works can be hardly implemented in highly dimensional pattern classification. The second problem is the trade-off between the learning speed and the chip size. Generally, there is a trade-off between the amount of circuits and the learning speed. A high processing parallelism realizes a high speed; however, it requires a large number of circuits. The number of learning iterations, which is usually very large and does not depend on the hardware parallelism, has a marked effect on the learning speed. Therefore, conventional VLSI implementations employing clock-based iterations consume much time on these iterations indifferently to the degree of hardware parallelism. In this work, the proposed fully parallel implementation of SVM was used in the image recognition problem. An analog Gaussian generation circuit, which is robust against process variations, is developed for highly dimensional pattern vectors. The center, height, and width of the generated Gaussian function feature can all be programmed easily. Furthermore, the chip-area-hungry part for highly dimensional Euclidean distance computations and the much smaller part for exponential computation are built separately. Only the exponential computing circuits should be duplicated for a high degree of parallelism. In this manner, a fully parallel learning SVM processor was built within the compact chip area in a standard 0.18 um CMOS technology. Upon receiving highly dimensional pattern vectors, the learning process autonomously proceeded without any clock-based control and self-converged within a single clock cycle of the system (at 10MHz). To confirm the learning/classifying performance characteristics, sixteen object images from a database are converted into 64-dimensional vectors and fed into the proposed SVM processor as learning samples. After self-learning, several other vectors were used as test patterns. The proposed SVM processor classified all the testing patterns into correct classes according to the measurement results. The processing speed, chip area and power consumption performances are improved compared with the traditional approaches. As a generally applied methodology, the proposed fully parallel architecture is also used to implement the unsupervised machine learning algorithms. On the basis of K-means mechanism, which is an important pattern clustering algorithm, a hardware efficient version is developed and named as K-Quasi-Centers (KQCs) method. From viewpoint of clustering results, the suggested scheme of clustering method has similar convergence performance to the original K-means algorithm. By implementing this modified clustering algorithm, the proposed analog fully parallel architecture can be applied to solve the unsupervised pattern clustering problem. The proof-of-concept processor was designed for 64-dimensional vectors categorization. In order to verify the performances of the proposed processor, sixteen images of two kinds of objects selected from the real image database were converted into feature vectors and fed into our KQC clustering processor. According to the circuit simulation results, all the images were correctly categorized into their respective classes even with several different random initializations, and the categorization results self-converged with higher speed than conventional approaches. From the above image processing applications, the proposed architecture performs a high processing speed and acceptable accuracy. However, the processing capacity of VLSI implementations is seriously limited by the chip size. One of the reasonable solutions to increase the number of learning samples is applying the on-line learning strategy, which was originally developed by software programs. In this work, the efficiency and importance of each learning sample are evaluated after the learning operation. The most inefficient sample is discarded to make the learning processor accept a new sample on-line. Employing the updated samples, the learning operation is repeated again. In this manner, a fixed VLSI processor can be used for the learning operation of a very large scale even unpredictable sample space. However, since on-line learning results in a large number of machine learning operations, this strategy is difficult to realize using software or traditional VLSI processors. Employing the proposed fully parallel architecture, the learning operations are accomplished with a high speed. Thus, this on-line learning strategy is efficient for the proposed architecture particularly. In order to verify the on-line learning performances, both SVM and KQC algorithms were implemented employing the analog fully parallel architecture for the image classification and clustering problems, respectively. From the circuit simulation results, the learning results are all correct with the consideration of on-line received samples. Furthermore, a visual tracking system was built by the combination of FPGA boards and the analog SVM processor developed by this work. Employing the on-line learning SVM, the object tracking performances were improved compared with those of conventional approaches. Besides the pattern classification and clustering problems, another important task of machine learning is called data domain description. It was found that the data domain description has an enhanced capacity for pattern recognitions. For instance, the SVM classification algorithm is originally for the two-class classification problems; but in the real-world applications, various numbers of classes might be required, even only a single class of learning samples is available in some applications. To solve these problems, a data domain description theory (also called one-class classification) was developed as an extension of SVM theory, which is named support vector domain description (SVDD). The SVDD algorithm has been applied in some classification problems, even unsupervised clustering problems by software programs. In this work, the SVDD algorithm has been implemented by our proposed analog fully parallel architecture. The proof-of-concept chip was built for the 64-dimensional pattern recognition. A multiple chip topology was proposed for multi-class recognition problems. For expending the classes, the number of chips can be freely increased. As an example, a three-class classification system employing three SVDD chips was built for real image recognition. After the on-chip learning session, several test images were fed in the system. From the chip measurement results, all the test patterns were correctly recognized. As the extension of analog VLSI implementations for soft-computing tasks, we discuss how CMOS supporting circuitries can interface the fabric of nano devices with digital computing world. Using CMOS ring oscillators to emulate the nano oscillator behavior, how to produce the associative memory function and to use it for image recognition is demonstrated by circuit simulation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Compressive Computation in Analog VLSI Motion Sensors

We introduce several di erent focal plane analog VLSI motion sensors developed in the past. We show how their pixel-parallel architecture can be used to extract low-dimensional information from a higher dimensional data set. As an example we present an algorithm and corresponding experiments to compute the focus of expansion, focus of contraction and the axis of rotation from natural visual inp...

متن کامل

Vlsi Delta-sigma Cellular Neural Network for Analog Random Vector Generation

We present a cellular neural network architecture for parallel analog random vector generation, including experimental results from an analog VLSI prototype with 64 channels. Nearest-neighbor coupling between cells produces parallel channels of uniformly distributed random analog values, with statistics that are truly uncorrelated across channels and over time. The cell for each random channel ...

متن کامل

Weight Perturbation: An Optimal Architecture and Learning Technique for Analog VLSI Feedforward and Recurrent Multilayer Networks

Previous work on analog VLSI implementation of multilayer perceptrons with on-chip learning has mainly targeted the implementation of algorithms such as back-propagation. Although back-propagation is efficient, its implementation in analog VLSI requires excessive computational hardware. It is shown that using gradient descent with direct approximation of the gradient instead of back-propagation...

متن کامل

Design and Non-linear Modelling of CMOS Multipliers for Analog VLSI Implementation of Neural Algorithms

The analog VLSI implementation looks an attractive way for implementing Artiicial Neural Networks; in fact, it gives small area, low power consumption and compact design of neural computational primitive circuits. On the other hand, major drawbacks result to be the low computational accuracy and the non-linear behaviour of analog circuits. In this paper, we present the design and the detailed b...

متن کامل

A VLSI neuromorphic device for implementing spike-based neural networks

We present a neuromorphic VLSI device which comprises hybrid analog/digital circuits for implementing networks of spiking neurons. Each neuron integrates input currents from a row of multiple analog synaptic circuit. The synapses integrate incoming spikes, and produce output currents which have temporal dynamics analogous to those of biological post synaptic currents. The VLSI device can be use...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2013

A Fully Parallel Analog VLSI Architecture for Implementing Learning Algorithms

نویسندگان

چکیده

منابع مشابه

Compressive Computation in Analog VLSI Motion Sensors

Vlsi Delta-sigma Cellular Neural Network for Analog Random Vector Generation

Weight Perturbation: An Optimal Architecture and Learning Technique for Analog VLSI Feedforward and Recurrent Multilayer Networks

Design and Non-linear Modelling of CMOS Multipliers for Analog VLSI Implementation of Neural Algorithms

A VLSI neuromorphic device for implementing spike-based neural networks

عنوان ژورنال:

اشتراک گذاری